An Interval Tree Based Feature Reduction Method For Cancer Classification Using High-Throughput DNA Copy Number Data
Abstract
Cancer classification using DNA copy number data is an important bioinformatics problem. Effective machine learning models for this task can be useful not only for cancer diagnosis, but also for discovering novel tumor suppressor genes and oncogenes. Recent array-based assays that detect DNA copy numbers contain very large numbers of probes and thus generate data of extremely high dimensionality, which calls for appropriate feature reduction methods. In this paper, we propose an efficient interval tree based feature reduction method for cancer classification using DNA copy number data. Instead of using probes as features, our approach extracts intervals as features from the original probe data. Experimental results on two real data sets show that our approach yields statistically significantly better classification accuracy than the baseline approach, in which the DNA copy number data at probe loci are used directly as features.
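The abstract does not spell out how intervals are extracted, so the following is only a minimal sketch of the general idea of replacing probe-level features with interval-level ones, not the authors' algorithm. It assumes each sample is a list of discretized probe states (+1 gain, -1 loss, 0 normal), and it uses a plain linear overlap scan where the paper uses an interval tree. The names `probes_to_intervals`, `overlaps`, and `interval_features` are hypothetical.

```python
from typing import List, Tuple

# (first probe index, last probe index, state); state is +1 (gain) or -1 (loss).
Interval = Tuple[int, int, int]


def probes_to_intervals(states: List[int]) -> List[Interval]:
    """Collapse runs of consecutive probes sharing a non-zero state."""
    intervals: List[Interval] = []
    start = None
    for i, s in enumerate(states):
        # Close the open run when the state changes.
        if start is not None and s != states[start]:
            intervals.append((start, i - 1, states[start]))
            start = None
        # Open a new run on an aberrant (non-zero) probe.
        if start is None and s != 0:
            start = i
    if start is not None:
        intervals.append((start, len(states) - 1, states[start]))
    return intervals


def overlaps(a: Interval, b: Interval) -> bool:
    """True if the two intervals share a probe and have the same state."""
    return a[0] <= b[1] and b[0] <= a[1] and a[2] == b[2]


def interval_features(samples: List[List[int]],
                      candidates: List[Interval]) -> List[List[int]]:
    """One binary feature per candidate interval (assumed encoding).

    A linear scan stands in for the interval tree overlap query here.
    """
    rows = []
    for states in samples:
        own = probes_to_intervals(states)
        rows.append([int(any(overlaps(c, iv) for iv in own))
                     for c in candidates])
    return rows


if __name__ == "__main__":
    samples = [[0, 1, 1, 0, -1, -1], [0, 0, 1, 1, 0, 0]]
    candidates = sorted({iv for s in samples for iv in probes_to_intervals(s)})
    print(candidates)
    print(interval_features(samples, candidates))
```

In this toy input, six probe-level features per sample reduce to three interval-level features. With real array data the linear scan would be the bottleneck, which is presumably where an interval tree would pay off.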
Similar resources
Feature Selection and Classification of Microarray Gene Expression Data of Ovarian Carcinoma Patients using Weighted Voting Support Vector Machine
DNA microarray gene expression data give access to a wealth of information covering thousands of variables (genes). Analysis of this information can reveal the genetic causes of disease and differences between tumors. In this study we try to reduce high-dimensional data by a statistical method to select valuable genes with high impact as biomarkers and then classify ovarian tumors based on gene expression data of...
Aneuploidy prediction and tumor classification with heterogeneous hidden conditional random fields
MOTIVATION The heterogeneity of cancer cannot always be recognized by tumor morphology, but may be reflected by the underlying genetic aberrations. Array comparative genome hybridization (array-CGH) methods provide high-throughput data on genetic copy numbers, but determining the clinically relevant copy number changes remains a challenge. Conventional classification methods for linking recurre...
Ensemble Classification and Extended Feature Selection for Credit Card Fraud Detection
Due to the rise of technology, the possibility of fraud in areas such as banking has increased. Credit card fraud is a crucial problem in banking and its danger is ever increasing. This paper proposes an advanced data mining method, considering both feature selection and decision cost for accuracy enhancement of credit card fraud detection. After selecting the best and most effec...
Assessment of mitochondrial DNA copy number in peripheral blood leukocyte of opiate abusers and healthy individuals
Background: Based on the studies, variation in the mitochondrial DNA (mtDNA) copy number in peripheral blood leukocytes is associated with increased susceptibility to diseases including cancer. Opiate abusers are at high risk for diseases. In this study, we measured the mtDNA copy number in peripheral blood leukocytes in a group of opiate abusers compared with those in healthy individuals. Met...
A New Framework for Distributed Multivariate Feature Selection
Feature selection is an important issue in classification. Selecting good features by maximizing relevance to the class label and minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms work only in a centralized setting. In this paper, we suggest a distributed version of the mRMR featu...
Publication date: 2007